DETAILED ACTION
Specification 

The disclosure is objected to because of the following informalities: On line 7 of Page 2, 
"at one" should be "at least one". On line 6 of claim 18, page 42, "ending" should be "sending". 
Appropriate correction is required. 

Claim Objections 

Claim 18 is objected to because of the following informalities: On line 6 of claim 18, 
page 42, "ending" should be "sending". Appropriate correction is required. 



Claim Rejections - 35 USC §102 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the
basis for the rejections under this section made in this Office action: 
A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by another filed
in the United States before the invention by the applicant for patent or (2) a patent granted on an application for 
patent by another filed in the United States before the invention by the applicant for patent, except that an 
international application filed under the treaty defined in section 351(a) shall have the effects for purposes of this
subsection of an application filed in the United States only if the international application designated the United 
States and was published under Article 21(2) of such treaty in the English language. 

Claims 1-9 and 12-20 are rejected under 35 U.S.C. 102(e) as being anticipated by Cosatto 
et al. U.S. Patent No. 6,504,546 (hereinafter Cosatto). 

Re Claims 1, 12, 18:

Cosatto discloses a method for creating a virtual video, comprising at least one of steps 

a)-d): 
a) Sending an image of an object to a receiver via an information line (e.g., the receiver
being a low-cost PC; column 15, lines 5-10 and an information line refers to the communication 
line between the low cost PC and the image bitmap database; column 14, lines 63-67), said 
image having a plurality of identifiable image points, said plurality of identifiable image points 
(e.g., samples for the bitmap of the facial parts are fewer than the remaining bitmap image 
points and the samples correspond to the object points of the facial object) being substantially 
fewer in number than a number of remaining image points of said image points of said image, 
said object having a plurality of identifiable object points, and said plurality of identifiable image 
points corresponding to said plurality of identifiable object points (e.g., object refers to a three- 
dimensional object such as a talking person or the talking head of the base face; and the image 
refers to a bitmap of a facial part. In the case of modeling a human face, the set of three- 
dimensional planes correspond to a set of pre-defined facial parts and these bitmaps are then 
normalized and parameterized before being entered into a database. For the synthesis of a 
human head, a text-to-speech synthesizer provides the audio track, as well as a phoneme string 
and trajectory which computes motion for all the facial parts including the whole head. These 
trajectories provide the parameters for selecting the proper bitmaps from the database; see 
column 3, lines 27-53);

b) Determining object position data of said plurality of identifiable object points on said 
object (e.g., Fig. 24 and Table 2 list various identifiable object points on the grid. Moreover, to 
create a video animation at 30 frames per second, the trajectory is sampled every 33.33 
milliseconds and for each sample point, the closest grid entry and its associated bitmap is chosen 
and the parameters describing feature shapes are chosen such that transitions between 
neighboring samples look smooth; column 13, lines 22-34; A frame of the final animation can be
generated when bitmaps of all the face parts have been retrieved from the database and the 
bitmap of the base face is first copied into the frame buffer, then the bitmaps of face parts are 
projected onto the base face using the 3D model and the pose and the whole frame is rendered 
with just a few texture-map operations which makes it possible to render the talking head in real 
time on a low-cost PC; column 14, line 62 to column 15, line 9);

c) Sending said object position data to said receiver via an information line (e.g., the
receiver being a low-cost PC; column 15, lines 5-10 and an information line refers to the 
communication line between the low cost PC and the image bitmap database; column 14, lines 
63-67); and 

d) Morphing or warping said image such that image position data of said identifiable 
image points of said image are adjusted to approximately correspond to said object position data 
(Morphing is discussed in the Background of Invention and the cited reference discloses that it is 
sufficient to use warping or alpha blending said image such that image position data of said 
identifiable image points of said image are adjusted to approximately correspond to said object 
position data for the purpose of computational saving; see column 13, lines 55-67. The cited 
reference teaches that morphing provides better results. During the transition interval, the 
resulting pixel is a blend of the corresponding pixels from sample a and sample b. The number of 
samples that are used to create a transition varies depending on the sampling rate of the 
trajectory and the duration of the samples. When the database contains few samples, the visual 
difference between samples is larger and more sophisticated techniques such as morphing 
provide better results. In column 14, the cited reference discloses that instead of directly 



Application/Control Number: 10/764,557 Page 5 

Art Unit: 2672 

mapping a phoneme to a viseme, each parameter of a viseme is derived from a sequence of 
phonemes and this generic model for coarticulation can be converted to a data-driven model and 
to synthesize new articulations of speech, the appropriate phoneme sequences are identified in 
the coarticulation database and are then concatenated).

Although the cited reference teaches warping or a cheaper blending technique, it also
teaches the claim limitation of "morphing" by disclosing the warping technique and the texture 
mapping technique for blending the image bitmaps and the base face model. The cited reference 
further discloses using morphing of the image bitmaps and the base face model to provide better 
results when the database contains few samples. Moreover, morphing has been extensively 
discussed in the Background of Invention. The cited reference teaches that morphing, warping
and alpha blending for the texture mapping are appropriate techniques for smoothing and
blending applied to the strings of bitmaps to eliminate hard transitions and create a seamless 
animation for each facial part (column 3, lines 34-53 and Fig. 5, column 6, lines 7-20; column 7, 
lines 40-61). In column 7, lines 40-61, the cited reference further discloses a morphological 
operation followed by adaptive thresholding to result in a binary image where areas of facial 
features are marked with blobs of black pixels. 

Claim 2: 

Cosatto further discloses morphing said image bitmaps such that image position data of 
said remaining image points are adjusted depending on said object position data (column 7, lines 
50-61; column 11; and column 14, line 62 to column 15, line 9).

Claim 3: 
Cosatto further discloses a face of a person and said plurality of identifiable object points
comprises at least one of the following features including an eye, a nostril, an eyebrow, and a 
mouth (column 7, lines 35-40, and column 14, lines 53-61). 

Claim 4: 

Cosatto further discloses three-dimensional object position data of a talking head (Fig. 24 
and Table 2 and column 11). 
Claim 5: 

Cosatto further discloses the animation of the remaining facial parts including jaw, eyes,
forehead and eyebrows, and that identifying and determining the remaining facial parts includes
identifying and determining the second identifiable image points corresponding to the second
identifiable image points of the base face (column 14, lines 53-61).

Claim 6: 

Cosatto further discloses the claim limitation of identifying said plurality of second 
identifiable image points at least in part by point differentiation, whereby a second identifiable 
image point is identified by differentiating said second identifiable image point from other points 
in said second image on the basis of at least one of absolute position in said second image; 
relative position compared to said other points; and magnitude/brightness (e.g., absolute 
positions are identified; see column 9, line 50 to column 10, line 17 and Table I). 

Claim 7: 

Cosatto further discloses in column 14, lines 62-67 that a frame of the final animation can 
be generated when bitmaps of all the face parts have been retrieved from the database and the
bitmap of the base face is first copied into the frame buffer and then the bitmaps of face parts are 
projected onto the base face using the 3D model and the pose. The second image and the third
image refer to the second bitmap and the third bitmap of the facial parts. The first frame and the
second refer to the first frame and the second in a sequence of visemes. With regards to the
identifiable image points, Cosatto discloses in column 10, lines 50-53 that the outline of lips, one
of the facial parts, is, for example, encoded as a sequence of points and all these points are then
mapped into the normalized plane before entering them into the database. With regards to the
object points, Cosatto further discloses in Fig. 24 and Table 2 a list of various identifiable object 
points on the grid. With regards to the relationship between the first frame and the second, 
Cosatto discloses that, to create a video animation at 30 frames per second, the trajectory is 
sampled every 33.33 milliseconds and for each sample point, the closest grid entry and its 
associated bitmap is chosen and the parameters describing feature shapes are chosen such that 
transitions between neighboring samples look smooth; column 13, lines 22-34; A frame of the 
final animation can be generated when bitmaps of all the face parts have been retrieved from the 
database and the bitmap of the base face is first copied into the frame buffer, then the bitmaps of 
face parts are projected onto the base face using the 3D model and the pose and the whole frame 
is rendered with just a few texture-map operations which makes it possible to render the talking 
head in real time on a low-cost PC; column 14, line 62 to column 15, line 9.
Claim 8: 

Cosatto further discloses that the number of samples that are used to create a transition
varies depending on the sampling rate of the trajectory and the duration of the samples (column 
13, lines 55-67). 

Claim 9: 
Cosatto further discloses capturing tens of thousands of video frames (column 15, lines
30-46), training a set of 300 frames (column 8, lines 50-55) and using a variety of frame rates 
including a rate of at least 5 times per second (column 6, line 66 to column 7, line 13). 

Re Claim 13: 

Cosatto further discloses viewing a facial image as viseme (column 15, lines 10-19) and 
marking an area by the color analysis as a candidate of a face area combined with candidates of 
eye areas produced by the texture analysis (column 7, lines 50-61) and marking on the shape of 
the lips of the current phoneme being uttered (column 14) and mapping a phoneme to a viseme 
(column 14). 

Re Claim 14: 

Cosatto further discloses that the bitmaps of face parts are projected onto the base face 
using the 3D model and the pose and the whole frame is rendered with just a few texture-map 
operations which makes it possible to render the talking head in real time on a low-cost PC; 
column 14, line 62 to column 15, line 9, and thereby Cosatto discloses that identifying mouth
image position data is performed automatically by a computer processor.

Re Claims 15 and 20:

Cosatto discloses capturing accurately realistic speech postures, human subjects speaking 
short text sequences in front of a camera and automatically analyzing the video footage by the 
face recognition system and selecting the proper samples and extracting the needed bitmaps from 
video frames and synthesizing the talking head animation to create the photo-realistic talking 
head (column 4, lines 10-22). Cosatto discloses mapping a phoneme to a viseme (column 14) and 
using the text-to-speech synthesizer to drive the entire animation to create a talking head (column 
15). Cosatto further discloses that morphing, warping and alpha blending for the texture mapping
are appropriate techniques for smoothing and blending applied to the strings of bitmaps to
eliminate hard transitions and create a seamless animation for each facial part (column 3, lines 
34-53 and Fig. 5, column 6, lines 7-20; column 7, lines 40-61). In column 7, lines 40-61, the 
cited reference further discloses a morphological operation followed by adaptive thresholding to 
result in a binary image where areas of facial features are marked with blobs of black pixels. 
Re Claim 16: 

Cosatto further discloses morphing the remaining facial parts such as jaw, eyes, forehead 
and eyebrows (column 14). 
Re Claim 17: 

Cosatto further discloses capturing tens of thousands of video frames (column 15, lines 
30-46), training a set of 300 frames (column 8, lines 50-55) and using a variety of frame rates 
including a rate of at least 5 times per second (column 6, line 66 to column 7, line 13). Cosatto 
discloses displaying a virtual video of a talking head (column 15). 

Re Claim 19:

Cosatto discloses high-resolution animation involving the short sequences for the base 
face totaling about 3MB compressed using MPEG 2 and the facial parts including jaw, eyes, 
forehead and eyebrows of 5 kB for each sample with a total of 40 samples and 48 mouth samples 
to create the sound face image (column 11, line 39 to column 12, line 5). 



Claim Rejections - 35 USC § 103 
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

Claim 10 is rejected under 35 U.S.C. 103(a) as being unpatentable over Cosatto et al. 
U.S. Patent No. 6,504,546 (hereinafter Cosatto) in view of Hayashi U.S. Patent No. 5,652,670 
(hereinafter Hayashi). 

Cosatto further discloses recording a person's posture using cameras (column 6, lines 50- 
65) and using the 3D scanning techniques such as a Cyber Ware range scanner (column 1, lines 
50-65). Cosatto is silent as to using laser-based scanners and cameras. However, Hayashi
discloses a laser scanner (See Hayashi the Abstract). It would have been obvious to have used 
Hayashi's laser scanner for taking a person's facial image because Cosatto has taught using a
Cyber Ware range scanner or an optical scanner (column 1, lines 50-65) which may be a laser 
scanner by itself, or if not, alternatively using Hayashi's laser scanner because at the time of 
invention, a laser scanner was available for taking a person's facial image. One of ordinary skill
in the art would have been motivated to incorporate an optical scanner such as a laser scanner
for taking a person's facial image using a compact scanner for cost reduction (Hayashi column
1).

Claim 11 is rejected under 35 U.S.C. 103(a) as being unpatentable over Cosatto et al.
U.S. Patent No. 6,504,546 (hereinafter Cosatto). 
Cosatto further discloses machine-executable code (Table 3 and column 12, lines 53-57)
to cause a machine (PC) to perform the method as in claim 1. Cosatto, however, is silent as to a
computer-usable medium. However, one of ordinary skill in the art would have recognized that
a computer-usable medium (i.e., floppy, cd-rom, etc.) carrying computer-executable instructions
for implementing a method is generally well-known in the art, because it would facilitate the
transporting and installing of the method on other systems. For example, a copy of the
Microsoft Windows operating system can be found on a cd-rom from which Windows can be 
installed onto other systems, which is a lot easier than running a long cable or hand typing the 
software onto another system. The Office takes Official Notice of this teaching. Therefore, it 
would have been obvious to put Cosatto's program or algorithm on a computer readable medium, 
because it would facilitate the transporting, installing and implementing of Cosatto's program or 
algorithm on other systems. 

Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Jin-Cheng Wang whose telephone number is (571) 272-7665. 
The examiner can normally be reached on 8:00 - 6:30 (Mon-Thu). 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Mike Razavi, can be reached on (571) 272-7664. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306. 
Information regarding the status of an application may be obtained from the Patent
Application Information Retrieval (PAIR) system. Status information for published applications
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished
applications is available through Private PAIR only. For more information about the PAIR
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 